i) The promoter recognition signals in halophilic archaebacteria consist of two sequence
elements centered about -30 and -40 nucleotides upstream of the transcription initiation site. The
consensus sequences are TTAA (-30) and TTCGA (-40). Transcription termination
signals are tracts of T residues in the (+) strand of the template DNA and are often preceded by a
G+C-rich sequence, sometimes possessing inverted repeat symmetry. The transcripts of protein-encoding
genes are generally monocistronic, initiated close to the ATG translation initiation
codon, and noticeably lack the Shine-Dalgarno ribosome binding sequence.
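As an illustration of how the two promoter elements could be located computationally, the Python sketch below checks the DNA upstream of a known transcription start site for exact TTAA and TTCGA matches centred near -30 and -40. The function names, the exact-match rule and the tolerance of 3 nt either side of the expected centre are hypothetical choices; natural promoter elements can deviate from the consensus.

```python
import re

# Consensus elements and approximate centre positions taken from the text.
# Positions are relative to the transcription start site (+1).
PROMOTER_BOXES = {"TTAA": -30, "TTCGA": -40}


def box_hits(upstream: str, motif: str, centre: int, window: float = 3) -> list[float]:
    """Centres of exact motif matches lying within +/- window nt of the expected centre.

    upstream is the DNA immediately 5' of the start site, so its last base is -1.
    The window size is an illustrative assumption, not a measured value.
    """
    n = len(upstream)
    hits = []
    # A zero-width lookahead also reports overlapping matches.
    for match in re.finditer(f"(?={motif})", upstream):
        first = match.start() - n  # string index -> position (index n-1 is -1)
        mid = first + (len(motif) - 1) / 2
        if abs(mid - centre) <= window:
            hits.append(mid)
    return hits


def has_halophile_promoter(upstream: str) -> bool:
    """True if both consensus elements occur near their expected positions."""
    return all(box_hits(upstream, m, c) for m, c in PROMOTER_BOXES.items())


# Hypothetical 50 nt upstream region built around the two consensus elements.
chars = list("G" * 50)
chars[8:13] = "TTCGA"   # positions -42..-38, centred on -40
chars[18:22] = "TTAA"   # positions -32..-29, centred on -30.5
region = "".join(chars)
print(box_hits(region, "TTCGA", -40), box_hits(region, "TTAA", -30))  # [-40.0] [-30.5]
print(has_halophile_promoter(region))  # True
```

Scanning only a window around each expected position, rather than the whole upstream sequence, reflects the observation that both elements sit at fairly fixed distances from the initiation site.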

ii) Ribosomal RNA operons are transcribed from one to nine tandem promoters in the 5'
flanking region. The initial step in processing the precursor 16S and 23S rRNAs from the primary transcript
involves cleavage by an endonuclease that is also used to excise the intron from certain tRNA
gene transcripts. The primary sequence and secondary structure of the cleavage site are highly
conserved among archaebacteria.

iii) Ribosomal protein genes are organized into eubacterial-like operons. The halophilic L1e-L10e-L12e
operon contains a putative regulatory sequence in the 5' transcribed leader that could
function in autogenous translational control.

iv) Archaebacterial ribosomal proteins exhibit substantial structural and sequence similarity to
eucaryotic ribosomal proteins and less similarity to their eubacterial equivalents.

v) Halophilic archaebacteria contain a Mn-containing superoxide dismutase (SOD) that is
homologous to the eubacterial Mn or Fe SOD and unrelated to the eucaryotic CuZn enzyme. The
sod gene and a related gene, slg, have been cloned and sequenced from H. cutirubrum, and their
regulation and expression have been characterized. These two sequences represent a unique
example of divergent evolution, driven by selection at the molecular level, of a sequence
duplicated within a genome.

Abbreviations: 

Hcu  -  Halobacterium  cutirubrum 

Hha - Halobacterium halobium (same species as Hcu)

Hvo  -  Halobacterium  volcanii 

Hma - Halobacterium marismortui

Sso - Sulfolobus solfataricus

Eco - Escherichia coli

Sce - Saccharomyces cerevisiae

Research  Objectives: 

i)  To  characterize  the  principles  of  gene  organization  and  regulation  of  gene  expression  in 
archaebacteria. 

ii)  To  elucidate  the  evolutionary  relationship  between  these  novel  organisms  and  the 
traditional  eubacterial  and  eucaryotic  organisms. 

iii)  To  understand  in  biophysical  and  molecular  terms  some  of  the  mechanisms  that  allow 
archaebacteria  to  inhabit  extreme  environments. 
Accomplishments:

Ribosomal protein genes: A 5.2 kbp ClaI-BamHI fragment of Hcu genomic DNA was
cloned using synthetic oligonucleotides complementary to the coding regions of partially
sequenced Hcu 50S subunit ribosomal proteins. The fragment was completely sequenced and
found to contain the genes encoding the proteins equivalent to the Eco L11, L1, L10 and L12
ribosomal proteins. In addition, two open reading frames, designated ORF and NAB, were
detected. The transcripts from this region were extensively characterized by Northern
hybridization, S1 nuclease protection and primer extension analysis. Four promoters were
located in front of the ORF, NAB, L11e and L1e genes, respectively. The first three initiate
transcription at or adjacent to the respective ATG translation initiation codons, whereas the
fourth produces a transcript with a 75 nucleotide long untranslated leader sequence.
Preceding each transcription initiation site are two motifs that appear to be conserved: the
general archaebacterial motif TTAA centered at about position -30 and a halophile-specific
motif TTCGA centered at about position -40. Transcription termination occurs at poly T tracts in
the DNA (+) strand that are often preceded by GC-rich sequences, sometimes containing inverted
repeat symmetry. Within the coding regions, poly T sequences are noticeably underrepresented
and the TTT Phe codon is not used.
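Two of the sequence features described above, poly T tracts in the (+) strand and the absence of the TTT Phe codon from coding regions, are simple to screen for. The Python sketch below illustrates both under stated assumptions: the minimum tract length of four T residues is a hypothetical threshold, and the example sequences are invented rather than taken from the Hcu fragment.

```python
import re
from collections import Counter


def poly_t_tracts(plus_strand: str, min_len: int = 4) -> list[tuple[int, int]]:
    """0-based, half-open (start, end) spans of T runs at least min_len long.

    min_len = 4 is an illustrative threshold, not one given in the text.
    """
    return [(m.start(), m.end()) for m in re.finditer(f"T{{{min_len},}}", plus_strand)]


def codon_usage(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Hypothetical fragments, for illustration only.
print(poly_t_tracts("GCGCTTTTTGCAAATTTG"))  # [(4, 9)]: the 3-nt run TTT is too short
usage = codon_usage("ATGTTCTTCGCTTAA")
print(usage["TTC"], usage["TTT"])  # Phe codons: 2 0
```

Applied to real coding regions, a zero count for TTT alongside a non-zero count for TTC would reflect the codon bias noted above.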

The tricistronic mRNA encodes the L1e, L10e and L12e ribosomal proteins, which are
produced in a 1:1:4 stoichiometry. Each of these cistrons is preceded by what appears to be the
equivalent of the eubacterial Shine-Dalgarno ribosome recognition sequence. Monocistronic
mRNAs lack these sequences and may initiate translation by a eucaryotic "thread-on" type
mechanism. The L1e binding site on 23S rRNA has been defined previously. We
identified a region in the 5' untranslated leader of the L1e-L10e-L12e mRNA that resembles this
rRNA binding site in primary sequence and secondary structure. We propose that this mRNA
leader sequence is used to regulate translation of the mRNA autogenously, by a mechanism
similar to that employed in eubacteria. Thus halophilic archaebacteria retain the same gene
order, and possibly also the same regulatory mechanism for controlling ribosomal protein
synthesis, that is found in eubacteria.

Ribosomal protein structure: The complete amino acid sequences of the L11e, L1e, L10e
and L12e ribosomal proteins from Hcu and Sso have been deduced from the respective gene
sequences and compared to the amino acid sequences of the equivalent eucaryotic and eubacterial
ribosomal proteins. On the large ribosomal subunit, the 1:4 complex of L10e and L12e,
in loose association with L11e, binds to 23S rRNA at about nucleotides 1050-1120 to form a
prominent stalk structure. This structure functions as a factor binding domain with associated
GTPase activities during the protein synthesis cycle. The L1e protein binds to 23S rRNA near
nucleotide 2150 and forms a prominent ridge on the large subunit opposite the stalk structure.
The L1e protein stabilizes peptidyl-tRNA binding to the P site and stimulates GTPase activity at
the factor binding domain. Alignments of the amino acid sequences of these four proteins from
eubacteria, eucaryotes and archaebacteria indicate that (i) archaebacteria represent a distinct
phylogenetic group and (ii) in general, the archaebacterial proteins more closely resemble their
eucaryotic than their eubacterial equivalents. Similarities between the L11e, L1e,
L10e and L12e proteins of Eco, Hcu, Sso and Sce are presented in the accompanying table.

Our work has provided valuable insights into the structure and evolution of the
L10e-L12e complex and has the potential to help us identify functional domains
within the respective proteins. Eucaryotes, including Sce, possess two different L12e genes
designated type I and II; in Sce, each of these is again duplicated to give four L12e genes,
designated IA, IB, IIA and IIB. Within the factor binding domain shared by all
L12e proteins (from eubacteria, archaebacteria and eucaryotes), the type I versus type II split
appears to be the most ancient. This implies that the common ancestor contained two L12e genes
and that one of these (probably the type II gene) was lost from the eubacterial and archaebacterial
lines. Although the factor binding domain remains a constant feature of all L12e proteins, the
eubacterial protein clearly differs in its overall structure from the archaebacterial and eucaryotic
equivalents. In Eco L12, the factor binding domain is shifted to the carboxy terminus of the
protein, and the amino terminus appears to have originated by a partial duplication event. The
archaebacterial-eucaryotic L12e proteins have the factor binding domain at the amino terminus
and a unique charged region at the carboxy terminus that is not present in the eubacterial protein.
Furthermore, a unique gene fusion placing the distal 75% of the L12e gene at the 3' end
of the L10e gene is evident in both archaebacteria and eucaryotes. This domain is highly
conserved in the two proteins, implying a strong constraint on sequence divergence. In
eubacteria, most of this fusion sequence has been removed from the contemporary L10 gene by
subsequent deletion, and virtually no homology remains from the ancient fusion events. These
and other detailed analyses of the structure, function and evolution of the domains of the L11e,
L1e, L10e and L12e proteins are continuing.

Superoxide dismutase in the archaebacteria: When life originated about 3.5 x 10^9 years ago,
the earth's atmosphere was anaerobic. Oxygen began to accumulate in the atmosphere about
2 x 10^9 years ago as a by-product of photosynthesis. Molecular oxygen is toxic to living
organisms largely because of the reactivity of the superoxide anion formed by its one-electron
reduction. The enzyme superoxide dismutase (SOD) carries out the dismutation of superoxide to
molecular oxygen and hydrogen peroxide. Two unrelated SOD enzymes have evolved
independently: the CuZn enzyme of eucaryotes and the Mn or Fe enzyme of eubacteria (and
eucaryotic organelles). We have shown that the halophilic SOD is homologous to the
eubacterial enzyme. The enzyme activity has been purified to near homogeneity from Hcu,
and the gene encoding it has been cloned, sequenced and characterized.
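For reference, the overall dismutation reaction referred to above can be written as follows. This is the standard stoichiometry shared by SOD enzymes in general, not a measurement from this work: one superoxide anion is oxidized to molecular oxygen while the other is reduced to hydrogen peroxide.

```latex
2\,\mathrm{O_2^{\bullet -}} + 2\,\mathrm{H^+} \xrightarrow{\ \mathrm{SOD}\ } \mathrm{O_2} + \mathrm{H_2O_2}
```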


Table 2. Similarities between the L11e, L1e, L10e and L12e ribosomal proteins of E. coli,
H. cutirubrum, S. solfataricus and S. cerevisiae

Protein / pair            Length  Identities  Gaps

L11e
  Hcu/Sso                    164    65 (40%)     1
  Hcu/Eco                    138    46 (33%)     1
  Sso/Eco                    139    45 (32%)     2

L1e
  Hcu/Sso                    214    66 (31%)     4
  Hcu/Eco                    211    58 (27%)    12
  Sso/Eco                    220    49 (22%)    10

L10e
  Hcu/Sso                    343    90 (26%)     1
  Hcu/Eco                    169    40 (24%)     6
  Sso/Eco                    169    35 (21%)     6
  Hcu/Sce                    329    76 (23%)     7
  Sso/Sce                    322    72 (23%)     7
  Eco/Sce                    163    27 (17%)     6

L12e
  Hcu/Sso                    110    46 (42%)     1

L12e globular domain
  Hcu/Sso                     56    28 (50%)     0
  Hcu/Eco                     44    13 (30%)     7
  Sso/Eco                     44    12 (27%)     7
  Hcu/Sce IA                  57    16 (28%)     2
  Hcu/Sce IB                  57    18 (32%)     2
  Hcu/Sce IIA                 54     9 (17%)     6
  Hcu/Sce IIB                 54    16 (30%)     6
  Sso/Sce IA                  57    18 (32%)     2
  Sso/Sce IB                  57    17 (30%)     2
  Sso/Sce IIA                 54    12 (22%)     6
  Sso/Sce IIB                 54    17 (31%)     6
  Sce IA/Sce IB               58    32 (55%)     2
  Sce IA/Sce IIA              55     8 (15%)     6
  Sce IA/Sce IIB              55    14 (25%)     6
  Sce IB/Sce IIA              54     9 (17%)     6
  Sce IB/Sce IIB              54    11 (20%)     6
  Sce IIA/Sce IIB             52    27 (52%)     0

Note: The average length in amino acids over the region of comparison of the two sequences is
determined as the length of the comparison region minus half of the total gaps. Identities are
given as the number (and percentage) of perfect matches over the region of comparison. Gaps are
given as the number of deletions (or insertions) required to achieve alignment of the two
sequences within the region of comparison. The L12e globular domain is the region indicated in
Fig. 4B. In Hcu and Sso L12e this region extends from position 1 to 56 of Fig. 4A. Comparisons
with Eco L12 are with the C domain and not the N-terminus in Fig. 4B.
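The statistics in Table 2 can be derived from a pairwise alignment roughly as sketched below. This Python sketch assumes one reading of the table note: that the gaps subtracted in the length correction are gapped positions, so the average length equals the mean of the two ungapped sequence lengths over the region, while the Gaps column counts gap events (runs of '-'). The function names and the short aligned pair are hypothetical.

```python
from itertools import groupby


def gap_events(aligned: str) -> int:
    """Number of maximal runs of '-' (one insertion or deletion each)."""
    return sum(1 for char, _ in groupby(aligned) if char == "-")


def percent_identity(identities: int, avg_length: float) -> int:
    """Identities as a whole-number percentage of the average length."""
    return round(100 * identities / avg_length)


def compare_aligned(a: str, b: str) -> dict:
    """Table 2-style summary of two aligned protein sequences ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    gap_positions = a.count("-") + b.count("-")
    # Assumed reading of the note: region length minus half of the gapped positions
    # (equivalently, the mean of the two ungapped lengths).
    avg_length = len(a) - gap_positions / 2
    identities = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return {
        "avg_length": avg_length,
        "identities": identities,
        "percent_identity": percent_identity(identities, avg_length),
        "gaps": gap_events(a) + gap_events(b),
    }


# Hypothetical aligned pair, for illustration only.
print(compare_aligned("MKVLA-GT", "MRVLAQG-"))
# {'avg_length': 7.0, 'identities': 5, 'percent_identity': 71, 'gaps': 2}
# Rounding check against a table row (Hcu/Sso L11e: 65 identities over 164 residues).
print(percent_identity(65, 164))  # 40
```

Reporting identity relative to the average length, rather than the alignment length, keeps long insertions in one sequence from artificially lowering the percentage.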


In addition to the sod gene, which encodes the authentic SOD activity that has been
purified, we cloned a second gene from the Hcu genome, designated slg, which is highly similar to
sod. The slg gene is actively transcribed, but no SOD-like activity corresponding to its product
has been detected; the transcriptional regulation of slg differs from that of sod, and the 5'
and 3' flanking regions of the two genes are unrelated. Within the coding region, the four codons
specifying the amino acid residues that bind Mn in the protein are conserved in both genes. The genes
have 87% nucleotide sequence identity, whereas the proteins they encode have only 83% amino
acid sequence identity; for most homologous gene pairs the reverse holds, because many
third-position changes are silent. Mutations occur randomly at first, second and third codon
positions, and transversions outnumber transitions. Most mutational differences between slg and sod are
confined to two limited regions; other regions lack differences entirely. These two gene sequences
are apparently in the initial stage of an unusual mode of divergent evolution. Presumably, this
divergence is being driven by strong selection at the molecular level, either for acquisition of new
functions or for partition and refinement of ancestral functions, in one or both of the respective gene products.
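The substitution patterns described above (transitions versus transversions, the codon position of each difference, and nucleotide versus amino acid identity) can be tallied from an in-frame, ungapped alignment of two coding sequences. The Python sketch below assumes such an alignment and uses the standard genetic code; the function names and the 12 nt example pair are hypothetical and are not the sod or slg sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}
PURINES, PYRIMIDINES = set("AG"), set("CT")


def translate(cds: str) -> str:
    """Translate complete in-frame codons."""
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def identity(x: str, y: str) -> float:
    """Fraction of identical positions in two equal-length strings."""
    return sum(p == q for p, q in zip(x, y)) / len(x)


def substitution_profile(s1: str, s2: str) -> dict:
    """Classify each mismatch in two equal-length, in-frame, ungapped coding sequences."""
    if len(s1) != len(s2) or len(s1) % 3:
        raise ValueError("expected equal-length, in-frame sequences")
    profile = {"transitions": 0, "transversions": 0, "by_codon_position": [0, 0, 0]}
    for i, (x, y) in enumerate(zip(s1, s2)):
        if x == y:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise a transversion.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            profile["transitions"] += 1
        else:
            profile["transversions"] += 1
        profile["by_codon_position"][i % 3] += 1
    profile["nt_identity"] = identity(s1, s2)
    profile["aa_identity"] = identity(translate(s1), translate(s2))
    return profile


# Hypothetical toy pair (MAKD versus MPKE), for illustration only.
print(substitution_profile("ATGGCTAAAGAT", "ATGCCTAAGGAA"))
```

In this toy pair two of the three nucleotide differences change the encoded amino acid, so amino acid identity (50%) falls below nucleotide identity (75%), the same direction as the sod/slg comparison.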

An attempt to clarify the intriguing sod-slg relationship in Hcu was made by examining
the homologous sequences from the related halophile Hvo. This organism also has two copies of
a sod-related sequence; remarkably, the two copies are nearly identical, differing only at codons two
and three, as follows:


codon                    1    2    3    4   ...  199  200  TER
amino acid               M    S    D    Y   ...   F    E
sod 1   ...AACACCTTAC   ATG  TCA  GAC  TAC  ...  TTC  GAG  TAA  CGCGTACGC
sod 2   ...GTTACACATT   ATG  AGC  ---  TAC  ...  TTC  GAG  TAA  CCGGATCAT
amino acid               M    S    -    Y   ...   F    E

The remaining 197 codons are identical, and homology ends abruptly at the cistron boundaries.
We suspect that the duplication of the primordial sod gene was an ancient event and that in Hcu
selection at the molecular level is operating to produce two proteins with different (although
probably related) functions, whereas in Hvo selection is operating to conserve structure and
function. We are beginning to exploit the Hvo DNA transformation system to address questions
relating to the function, regulation and evolution of the sod genes and rRNA operons in
halophilic archaebacteria.
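The codon-by-codon comparison of the two Hvo copies can be expressed as a short Python sketch. It uses only codons 1 to 4 from the comparison of sod 1 and sod 2, marks the missing third codon of sod 2 as None, and reports whether each difference changes the encoded amino acid. The function name and output labels are hypothetical; the genetic code table is the standard one. Note that TCA and AGC both encode serine yet differ at all three positions, because serine is encoded by two separate codon families (TCN and AGY).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}


def compare_codons(copy1: list, copy2: list) -> list[tuple[int, str, str]]:
    """List (codon number, kind, detail) for each codon that differs between two copies.

    Hypothetical helper; None marks a codon that is absent (deleted) in that copy.
    """
    differences = []
    for number, (c1, c2) in enumerate(zip(copy1, copy2), start=1):
        if c1 == c2:
            continue
        if c1 is None or c2 is None:
            present = c1 or c2
            differences.append((number, "deletion", f"{present} ({CODE[present]})"))
            continue
        nt_changes = sum(a != b for a, b in zip(c1, c2))
        kind = "synonymous" if CODE[c1] == CODE[c2] else "non-synonymous"
        differences.append((number, kind, f"{c1}>{c2}, {nt_changes} nt changed"))
    return differences


# Codons 1-4 of the two Hvo copies, read from the comparison of sod 1 and sod 2.
sod1 = ["ATG", "TCA", "GAC", "TAC"]
sod2 = ["ATG", "AGC", None, "TAC"]
for row in compare_codons(sod1, sod2):
    print(row)
# (2, 'synonymous', 'TCA>AGC, 3 nt changed')
# (3, 'deletion', 'GAC (D)')
```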

Ribosomal RNA genes: In halophilic archaebacteria, rRNA operons have the gene order
16S, 23S, 5S. An alanine tRNA gene is located in the 16S-23S intergenic spacer, and a
cysteine tRNA gene is sometimes positioned distal to the 5S gene. The 5' flanking region contains from
one to nine tandemly repeated promoter sequences, each containing the TTAA (-30) and
TTCGA (-40) conserved sequences. The 16S and 23S genes are flanked by inverted repeat
processing sequences that are highly conserved in both primary sequence and secondary
structure. Excision at these sites liberates precursor 16S and 23S rRNAs from the primary
transcript; it appears likely that the responsible endonuclease is identical to the enzyme used to
excise introns from the 23S rRNA genes.
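Inverted repeats such as the processing sequences flanking the 16S and 23S genes can be screened for by looking for a segment whose reverse complement reappears a short distance downstream, so that the two arms could pair into a stem. The Python sketch below finds only perfect repeats of a fixed arm length; the function names, arm length, loop limits and test sequence are hypothetical, and real processing stems may contain mismatches that this simple search would miss.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq: str, arm: int, min_loop: int = 3,
                     max_loop: int = 20) -> list[tuple[int, int, int]]:
    """Return (left_start, right_start, loop_length) for perfect inverted repeats.

    The right arm must equal the reverse complement of the left arm and be
    separated from it by min_loop..max_loop nucleotides (0-based indices).
    """
    hits = []
    for i in range(len(seq) - 2 * arm - min_loop + 1):
        target = reverse_complement(seq[i:i + arm])
        for loop in range(min_loop, max_loop + 1):
            j = i + arm + loop
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, j, loop))
    return hits


# Hypothetical sequence: GCGCAT ... ATGCGC can pair into a 6 bp stem with a 4 nt loop.
example = "TT" + "GCGCAT" + "AAAA" + "ATGCGC" + "TT"
print(inverted_repeats(example, arm=6))  # [(2, 12, 4)]
```

The same search could be applied downstream of a termination tract to look for the inverted repeat symmetry sometimes seen in the preceding G+C-rich sequence.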

During a comparison of leader sequences from Hcu, Hvo and Hha, we observed an
80 nucleotide long region immediately preceding the 16S processing site that is more highly
conserved between these organisms than the 16S gene sequence itself. Only the last few nucleotides
of this sequence are required for recognition by the endonuclease. The function of the remainder
of this conserved sequence is being investigated.
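A region that is more conserved than its surroundings, such as the 80 nucleotide leader segment described above, can be located with a sliding-window identity profile over an alignment. The Python sketch below assumes two sequences already aligned to equal length; the function names, window width and example pair are hypothetical.

```python
def window_identity(a: str, b: str, width: int) -> list[float]:
    """Fraction of identical aligned positions in each window of the given width."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return [
        sum(x == y for x, y in zip(a[i:i + width], b[i:i + width])) / width
        for i in range(len(a) - width + 1)
    ]


def most_conserved_window(a: str, b: str, width: int) -> tuple[int, float]:
    """Start index (0-based) and identity of the best window; ties go to the earliest."""
    scores = window_identity(a, b, width)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical aligned pair, for illustration only.
profile = window_identity("AAAAAAAAAA", "AAAAATTTTT", width=5)
print(profile)  # [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
print(most_conserved_window("AAAAAAAAAA", "AAAAATTTTT", width=5))  # (0, 1.0)
```

Comparing the windowed identity of the leader with the average identity across the 16S gene would make the "more highly conserved" comparison quantitative.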
The species Hma has two rRNA operons, designated HC8 and HHIO. The first is
preceded by four tandem promoters, whereas the second is preceded by only a single promoter. In
addition, the HHIO operon appears to lack the normal 16S rRNA processing sequence. We have
begun a complete sequence comparison of these two operons. Remarkably, our results indicate
extensive differences between the two 16S coding regions, including base substitutions as
well as insertions and deletions. In most organisms, the multiple rRNA genes within a genome are
nearly identical in sequence. The functional significance of this sequence variation in the two Hma
operons is currently under investigation.